School of Politics & International Relations

More than Unigrams Can Say: Detecting Meaningful Multi-word Expressions from Political Texts

Almost universal among existing approaches to text mining is the adoption of the bag-of-words approach, which counts each word as a feature without regard to grammar or order. This approach remains extremely useful despite being an obviously inaccurate model of how observed words are generated in natural language. Many substantively meaningful textual features, however, occur not as unigram words but rather as multi-word expressions (MWEs): pairs of words or phrases that together form a single conceptual entity whose meaning is distinct from that of its individual elements. Here we present a new model for detecting meaningful multi-word expressions, based on the novel application of a statistical method for detecting variable-length term collocations. Combining this model with frequency and part-of-speech filtering, we show how to detect meaningful MWEs, with applications to public policy, political economy, and law. We extract and validate a dictionary of meaningful collocations from three large corpora totalling over 1 billion words, drawn from political manifestos, legislative floor debates, and US federal and Supreme Court briefs. Using these collocations to replicate published studies in each field that relied on unigrams alone, we demonstrate that collocations can improve accuracy and validity over the standard unigram bag-of-words model.
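
For readers new to collocation detection, the general idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model presented in the talk: it scores adjacent word pairs with pointwise mutual information (PMI), a common textbook association measure, and applies a simple frequency threshold. It handles two-word pairs only and omits part-of-speech filtering. The function name `score_bigrams`, the `min_count` parameter, and the toy text are all hypothetical.

```python
import math
from collections import Counter


def score_bigrams(tokens, min_count=2):
    """Return {(w1, w2): PMI} for adjacent pairs seen at least min_count times.

    Illustrative only: PMI stands in for whichever collocation statistic
    a real analysis would use.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # frequency filter: skip rare pairs
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI > 0 means the pair co-occurs more often than chance predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy text, invented for this example.
text = (
    "the supreme court ruled today . the supreme court will meet . "
    "the minister said the court ruled"
)
tokens = text.split()
scores = score_bigrams(tokens)

# Rank surviving pairs from strongest to weakest association.
for pair in sorted(scores, key=scores.get, reverse=True):
    print(" ".join(pair), round(scores[pair], 2))
```

A bag-of-words model would count "supreme" and "court" as two unrelated features. Scoring adjacent pairs lets an expression such as "supreme court" be treated as a single feature when the two words occur together more often than their individual frequencies would suggest.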

About the presenter:

Professor Kenneth Benoit is a world-leading scholar in the areas of quantitative text analysis methodology, party competition, legislative politics, and electoral systems. He has held positions at Trinity College Dublin and is also currently Professor of Quantitative Social Research Methods at the London School of Economics and Political Science.

Ken’s research focuses on automated, quantitative methods of processing large amounts of textual and other forms of big data – mainly political texts and social media – and the methodology of text mining. He is the creator and co-author of several popular R packages for text analysis, including quanteda, spacyr, and readtext. He has published extensively on applications of measurement and the analysis of text as data in political science, including machine learning methods and text coding through crowd-sourcing, an approach that combines statistical scaling with the qualitative power of thousands of human coders working in tandem on small coding tasks.

He received his PhD in Government with a specialisation in statistical methodology from Harvard University.

 

Date & time

  • Thu 27 Feb 2020, 12:00 pm - 2:00 pm

Location

LJ Hume Centre, Copland Building, ANU

Speakers

  • Prof Ken Benoit

Event Series

School of Politics and International Relations Seminar Series

Contact

  •  Intifar Chowdhury
     +61 2 6125 6785